{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kxu1Gfhx1pHg"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_11_3_code.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZbNbAV281pHh"
   },
   "source": [
    "# T81-559: Applications of Generative Artificial Intelligence\n",
    "**Module 11: Finetuning**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HwnvSYEQ1pHi"
   },
   "source": [
    "# Module 11 Material\n",
    "\n",
    "Module 11: Finetuning\n",
    "\n",
    "* Part 11.1: Understanding Finetuning [[Video]](https://www.youtube.com/watch?v=fflySydZABM) [[Notebook]](t81_559_class_11_1_finetune.ipynb)\n",
    "* Part 11.2: Finetuning from the Dashboard [[Video]](https://www.youtube.com/watch?v=RIJj1QLk-V4) [[Notebook]](t81_559_class_11_2_dashboard.ipynb)\n",
    "* **Part 11.3: Finetuning from Code** [[Video]](https://www.youtube.com/watch?v=29tUrxrneOs) [[Notebook]](t81_559_class_11_3_code.ipynb)\n",
    "* Part 11.4: Evaluating your Model [[Video]](https://www.youtube.com/watch?v=MrwFSG4PWUY) [[Notebook]](t81_559_class_11_4_eval.ipynb)\n",
    "* Part 11.5: Finetuning for Text to Image [[Video]](https://www.youtube.com/watch?v=G_FYFSzkB5Y&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_559_class_11_5_image.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XLFov09h18yC"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running and maps Google Drive if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "AWGARRT92DrA"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "try:\n",
    "    from google.colab import drive, userdata\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# OpenAI Secrets\n",
    "if COLAB:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
    "\n",
    "# Install needed libraries in CoLab\n",
    "if COLAB:\n",
    "    !pip install langchain openai streamlit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "v2MPPX0c1pHi"
   },
   "source": [
    "# Part 11.3: Finetuning from Code\n",
    "\n",
    "Just like the last part we will utilize the following training data.\n",
    "\n",
    "* [sarcastic.jsonl](https://data.heatonresearch.com/data/t81-559/finetune/sarcastic.jsonl)\n",
    "* [sarcastic_val.jsonl](https://data.heatonresearch.com/data/t81-559/finetune/sarcastic_val.jsonl)\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OawqUP3Jr85z"
   },
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "!wget https://data.heatonresearch.com/data/t81-559/finetune/sarcastic.jsonl\n",
    "!wget https://data.heatonresearch.com/data/t81-559/finetune/sarcastic_val.jsonl\n",
    "\n",
    "client = OpenAI()\n",
    "\n",
    "obj = client.files.create(\n",
    "  file=open(\"sarcastic.jsonl\", \"rb\"),\n",
    "  purpose=\"fine-tune\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7TZeEIbtyXxw"
   },
   "outputs": [],
   "source": [
    "obj.id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kQEFmRLzHCQH"
   },
   "outputs": [],
   "source": [
    "status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "slJEgf6XiFDw"
   },
   "source": [
    "### Run and Monitor Finetuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MSgK0cLbfDZA"
   },
   "outputs": [],
   "source": [
    "import time\n",
    "from openai import OpenAI\n",
    "\n",
    "# Start the fine-tuning job\n",
    "train = client.fine_tuning.jobs.create(\n",
    "    training_file=obj.id,\n",
    "    model=\"gpt-4o-mini-2024-07-18\"\n",
    ")\n",
    "\n",
    "done = False\n",
    "\n",
    "# Initialize a set to store processed event IDs\n",
    "processed_event_ids = set()\n",
    "\n",
    "while not done:\n",
    "    # Retrieve the latest status of the fine-tuning job\n",
    "    status = client.fine_tuning.jobs.retrieve(train.id)\n",
    "    print(f\"Job status: {status.status}\")\n",
    "\n",
    "    # Fetch all events related to the fine-tuning job\n",
    "    events = client.fine_tuning.jobs.list_events(fine_tuning_job_id=train.id)\n",
    "\n",
    "    # Collect new events that haven't been processed yet\n",
    "    new_events = []\n",
    "    for event in events:\n",
    "        if event.id not in processed_event_ids:\n",
    "            new_events.append(event)\n",
    "            processed_event_ids.add(event.id)\n",
    "\n",
    "    # Sort the new events in chronological order\n",
    "    new_events.sort(key=lambda e: e.created_at)\n",
    "\n",
    "    # Display the new events in order\n",
    "    for event in new_events:\n",
    "        print(f\"{event.created_at}: {event.message}\")\n",
    "\n",
    "    if status.status == \"succeeded\":\n",
    "        done = True\n",
    "        print(\"Done!\")\n",
    "    elif status.status == \"failed\":\n",
    "        done = True\n",
    "        print(\"Failed!\")\n",
    "    else:\n",
    "        print(\"Waiting for updates...\")\n",
    "        time.sleep(20)  # Sleep for 20 seconds\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "aRVAzHzYjT8T"
   },
   "outputs": [],
   "source": [
    "model_id = status.fine_tuned_model\n",
    "print(f\"Trained model id: {model_id}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xqdT0CyTiq2x"
   },
   "source": [
    "### Test the Finetuned Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "l5Eeuf3JiT6O"
   },
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "client = OpenAI()\n",
    "\n",
    "completion = client.chat.completions.create(\n",
    "  model=model_id,\n",
    "  messages=[\n",
    "    {\"role\": \"system\", \"content\": \"Marv is a factual chatbot that is also sarcastic.\"},\n",
    "    {\"role\": \"user\", \"content\": \"What is the capital of the USA?\"}\n",
    "  ]\n",
    ")\n",
    "print(completion.choices[0].message)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ukvYwN20ilos"
   },
   "source": [
    "### Delete Old Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "gPTpmdYbiOxp"
   },
   "outputs": [],
   "source": [
    "#client.models.delete(\"ft:gpt-4o-mini-2024-07-18:personal:sarcastic:A9yCtR0b\")"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
